IN THE CLAIMS:

Please amend the claims as follows: 
1. (Currently Amended) A method for clustering data points with defined quantified
[[relationships]] relation values between them comprising:

obtaining a lead value for each data point, wherein said lead value for each data point is
derived [[from any of said quantified relationships and as given input]] by taking a sum of all
relation values input into said data point plus a frequency associated with said data point,

ranking each data point in a lead value sequence list in descending order of lead value,
assigning a first data point in said lead value sequence list as a leader of a first cluster,
considering each subsequent data point in said lead value sequence list as a leader of a
new cluster if its relationship with leaders of each of the previous clusters is less than a defined
threshold value or as a member of at least one cluster where its relationship with a cluster leader
is at least equal to said threshold value, wherein the threshold value is adaptively found for a
given number of clusters, and

generating a text summarization of any of a single document and a collection of
documents [[based on said clustering of data points]] by segmenting a given text input comprising
said data points into clusters, and forming a set of leaders of said clusters to represent said text
summarization.
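The obtaining, ranking, assigning and considering steps recited in claim 1 can be illustrated with a minimal sketch. It is not the applicant's implementation. The function names, the matrix layout (`relation[i][j]` holds the relation value from point i into point j), the exclusion of a point's relation to itself, and the choice to compare a point against leaders using `relation[p][leader]` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def lead_values(relation: Sequence[Sequence[float]],
                frequency: Sequence[float]) -> List[float]:
    """Lead value of point j: sum of relation values input into j, plus its frequency.

    Assumption: relation[i][j] is the value from point i into point j, and
    the diagonal (a point's relation to itself) is not counted.
    """
    n = len(relation)
    return [
        sum(relation[i][j] for i in range(n) if i != j) + frequency[j]
        for j in range(n)
    ]


def leader_cluster(relation: Sequence[Sequence[float]],
                   frequency: Sequence[float],
                   threshold: float) -> Dict[int, List[int]]:
    """Return {leader index: member indices}, each leader listed first in its cluster."""
    lead = lead_values(relation, frequency)
    # Rank points in descending order of lead value (ties broken by index).
    order = sorted(range(len(lead)), key=lambda k: (-lead[k], k))
    clusters: Dict[int, List[int]] = {}
    for p in order:
        # Existing leaders whose relationship with p is at least the threshold.
        joined = [ld for ld in clusters if relation[p][ld] >= threshold]
        if joined:
            # p becomes a member of every cluster whose leader it is related to.
            for ld in joined:
                clusters[ld].append(p)
        else:
            # p is below the threshold for every previous leader: new cluster.
            clusters[p] = [p]
    return clusters
```

With this sketch the first point in the ranked list always becomes the leader of the first cluster, because no earlier leader exists to compare it against.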

2. (Previously Presented) The method of claim 1, wherein said quantified relationships
between data points are any of symmetric and asymmetric quantified relationships. 
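Claim 1 recites a threshold value that is adaptively found for a given number of clusters, without fixing a search procedure. One possible reading, offered only as a sketch, is to try each distinct relation value as a candidate threshold and keep the one whose leader count is closest to the requested number. The helper names `count_leaders` and `find_threshold`, the candidate set, and the tie-breaking rule are assumptions, not part of the claims.

```python
from typing import List, Sequence


def count_leaders(relation: Sequence[Sequence[float]],
                  lead: Sequence[float],
                  threshold: float) -> int:
    """Number of cluster leaders produced by one leader-clustering pass."""
    order = sorted(range(len(lead)), key=lambda k: (-lead[k], k))
    leaders: List[int] = []
    for p in order:
        # p leads a new cluster only if it is below threshold for all leaders so far.
        if all(relation[p][ld] < threshold for ld in leaders):
            leaders.append(p)
    return len(leaders)


def find_threshold(relation: Sequence[Sequence[float]],
                   lead: Sequence[float],
                   target_clusters: int) -> float:
    """Pick the candidate threshold whose cluster count is closest to the target.

    Assumption: candidates are the distinct values in the relation matrix;
    ties are resolved in favour of the smallest threshold.
    """
    candidates = sorted({v for row in relation for v in row})
    return min(
        candidates,
        key=lambda t: (abs(count_leaders(relation, lead, t) - target_clusters), t),
    )
```

An exhaustive scan is used here rather than bisection because the number of leaders is not guaranteed to change monotonically with the threshold.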

09/815,616 3

3. (Currently Amended) The method of claim 1, wherein [[the lead value of each data point is
determined by taking a sum of relation values of each of other data points to said data point]] said
frequency equals one.

4. (Currently Amended) The method of claim 1, [[wherein said threshold value is adaptively
found for a given number of clusters]] further comprising identifying distinct data points using
said lead values and said relation values between said data points.

5. (Previously Presented) The method of claim 1, further comprising organizing a set of data 
points into a hierarchy of clusters by clustering the data points into sets of small sizes, wherein 
each smaller set is further subclustered; and repeatedly subclustering said smaller set until a 
terminating condition is reached. 
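The repeated subclustering of claim 5 can be pictured as a recursion over the members of each cluster. The sketch below is one hedged interpretation: the terminating condition (a set no larger than `max_size`, or a threshold above 1.0), the raising of the threshold by `step` at each level, and the assignment of each point to only the first matching leader are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Sequence


def leaders_split(points: Sequence[int],
                  relation: Sequence[Sequence[float]],
                  lead: Sequence[float],
                  threshold: float) -> Dict[int, List[int]]:
    """One level of leader clustering restricted to `points`."""
    order = sorted(points, key=lambda k: (-lead[k], k))
    clusters: Dict[int, List[int]] = {}
    for p in order:
        # Simplification: join only the first leader that meets the threshold.
        home = next((ld for ld in clusters if relation[p][ld] >= threshold), None)
        if home is None:
            clusters[p] = [p]
        else:
            clusters[home].append(p)
    return clusters


def build_hierarchy(points: Sequence[int],
                    relation: Sequence[Sequence[float]],
                    lead: Sequence[float],
                    threshold: float,
                    step: float = 0.2,
                    max_size: int = 2) -> dict:
    """Recursively subcluster until a (hypothetical) terminating condition holds."""
    node = {"members": sorted(points), "children": []}
    # Assumed terminating condition: small enough set, or threshold exhausted.
    if len(points) <= max_size or threshold > 1.0:
        return node
    for leader, members in leaders_split(points, relation, lead, threshold).items():
        child = build_hierarchy(members, relation, lead,
                                threshold + step, step, max_size)
        child["leader"] = leader
        node["children"].append(child)
    return node
```

Because the threshold rises at every level, the recursion ends even when a subcluster fails to split further.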

6. (Currently Amended) The method of claim 1, wherein said step of generating further
comprises:

segmenting a given input text into blocks comprising sentences, a collection of sentences,
and paragraphs,

excluding words belonging to a defined list of defined stop words,
replacing words by their existing unique synonymous word from a given collection of
synonyms,

applying stemming algorithms for mapping words to root words,
representing resulting blocks of text, with respect to a dictionary which is either given or
computed from the input text, by a binary vector of size equal to the number of words in the
dictionary whose ith element is 1 if an ith word in the dictionary is present in the block,

computing the relationship between any data points di and dj by evaluating R(di,dj) =
|di.Tdi|/|di|, wherein T is a thesaurus matrix whose ijth element reflects an extent of inclusion of
meaning of jth word in the meaning of ith word, and

clustering the data points [[wherein the lead value of each data point is determined by
taking a sum of relation values of each of other data points to said data point, wherein threshold
value is adaptively found for a given number of clusters, and wherein a set of leaders of resulting
clusters summarize a given text]].
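The binary-vector representation and the thesaurus-weighted relationship of claim 6 can be sketched as follows. Several points are assumptions rather than statements of the claimed method: the numerator is read as the product of di with T applied to the second block's vector dj (so that R depends on both points), |di| is taken as the number of dictionary words present in block di, and the function names are hypothetical.

```python
from typing import List, Sequence


def to_binary_vector(block_words: Sequence[str],
                     dictionary: Sequence[str]) -> List[int]:
    """ith element is 1 if the ith dictionary word is present in the block."""
    present = set(block_words)
    return [1 if word in present else 0 for word in dictionary]


def block_relation(di: Sequence[int],
                   dj: Sequence[int],
                   T: Sequence[Sequence[float]]) -> float:
    """Assumed reading of R(di, dj): |di . (T dj)| / |di|."""
    # Apply the thesaurus matrix to dj.
    t_dj = [sum(T[i][j] * dj[j] for j in range(len(dj))) for i in range(len(T))]
    numerator = abs(sum(a * b for a, b in zip(di, t_dj)))
    size = sum(di)  # assumed |di|: count of dictionary words present in di
    return numerator / size if size else 0.0
```

With an identity thesaurus matrix this reduces to the fraction of words in block di that also occur in block dj, which is in general asymmetric.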

7. (Previously Presented) The method of claim 6, wherein said dictionary is computed by 
taking a fraction of words, excluding said stop words, with a highest tfidf value, which is given 
by: 

tfidf(wi) = tfi * log(N/dfi), 

where tfidf(wi) is the lead value of data point wi, tfi = a number of times the data point wi
occurred in a whole text, dfi = a number of documents containing the data point wi and N = a
total number of documents in the text.
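The dictionary construction of claim 7 can be sketched directly from the formula tfidf(wi) = tfi * log(N/dfi). The natural logarithm, the tokenised input format (a list of word lists), the function names, and the rule of keeping at least one word are assumptions; the claim fixes none of them.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def tfidf_scores(documents: Sequence[Sequence[str]],
                 stop_words: Set[str]) -> Dict[str, float]:
    """Return {word: tf * log(N / df)} over all non-stop words."""
    n_docs = len(documents)
    # tf: occurrences of the word in the whole text.
    tf = Counter(w for doc in documents for w in doc if w not in stop_words)
    # df: number of documents containing the word.
    df = Counter(w for doc in documents for w in set(doc) if w not in stop_words)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}


def build_dictionary(documents: Sequence[Sequence[str]],
                     stop_words: Set[str],
                     fraction: float) -> List[str]:
    """Keep the given fraction of words with the highest tfidf value."""
    scores = tfidf_scores(documents, stop_words)
    keep = max(1, int(len(scores) * fraction))  # assumption: keep at least one word
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:keep]
```

A word that appears in every document scores zero under this formula, so it falls to the bottom of the ranking regardless of how often it occurs.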

8. (Previously Presented) The method of claim 6, wherein said thesaurus matrix comprises 
any of a given identity matrix, and a computed matrix from a collection of documents. 

9. (Previously Presented) The method of claim 6, wherein each block is represented by a
vector whose ith element represents a frequency of occurrence of said ith word in the block. 
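The frequency-vector variant of claim 9 differs from the binary representation only in storing counts. A minimal sketch, assuming blocks arrive as lists of already normalised words and using a hypothetical function name:

```python
from collections import Counter
from typing import List, Sequence


def to_frequency_vector(block_words: Sequence[str],
                        dictionary: Sequence[str]) -> List[int]:
    """ith element is the number of times the ith dictionary word occurs in the block."""
    counts = Counter(block_words)
    # Counter returns 0 for words that never occur in the block.
    return [counts[word] for word in dictionary]
```

Words outside the dictionary are ignored, so the vector length always equals the dictionary size.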

10. (Previously Presented) The method of claim 6, further comprising organizing a set of text 
documents into a hierarchy of clusters by clustering given documents into sets of small sizes, 
wherein each smaller set is further subclustered; and repeatedly subclustering said smaller set
until a terminating condition is reached. 

11. (Previously Presented) The method of claim 10, further comprising organizing results
returned by an information retrieval system in response to an user query into an hierarchy of
clusters.

12. (Previously Presented) The method of claim 11, wherein the hierarchy is used to aid the
user in any of modifying a query of said user and browsing through said results.

13. (Previously Presented) The method of claim 11, wherein said information retrieval
system comprises a search engine retrieving Web documents. 

14. (Previously Presented) The method of claim 5, wherein said step of generating is applied 
to vocabulary organization for a group of documents wherein the data points are words in a 
dictionary of the vocabulary, wherein the lead value of a word is any of its frequency of 
occurrence in the collection of documents, a number of documents containing the word, and a
tfidf value of said word, wherein a relationship R(di,dj) denotes a fraction of documents
containing a jth word that also contains an ith word, and the clustering of said data points results 
in a structured hierarchical organization of the vocabulary. 

15. (Previously Presented) The method of claim 14, wherein a structured vocabulary is used 
to provide text summarization for associated documents. 

16. (Previously Presented) The method of claim 14, further comprising applying the
clustering to customer profiling wherein a dictionary is built and the vocabulary is organized
using documents that are viewed by a customer.

17. (Previously Presented) The method of claim 5, wherein said data points correspond to
products cataloged in an electronic store, the lead value of a product is its per unit profit, its per
unit value or a number of items sold per unit time, and a relationship between the products is
either explicitly defined or derived from purchase data. 

18. (Previously Presented) The method of claim 17, wherein a product di is related to a 
product dj by a fraction of customer transactions containing dj that also contain di.
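The purchase-derived relationship of claim 18 is a conditional fraction over transactions. A minimal sketch, assuming each transaction is a set of product identifiers and that the relation is zero when no transaction contains dj (the claim does not address that case); the function name is hypothetical:

```python
from typing import Hashable, Iterable, Set


def transaction_relation(transactions: Iterable[Set[Hashable]],
                         di: Hashable,
                         dj: Hashable) -> float:
    """Fraction of transactions containing dj that also contain di."""
    with_dj = [t for t in transactions if dj in t]
    if not with_dj:
        return 0.0  # assumption: undefined case treated as no relationship
    return sum(1 for t in with_dj if di in t) / len(with_dj)
```

The measure is asymmetric: a product bought in every basket that contains another product need not have the reverse hold.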

19. (Previously Presented) The method of claim 17, further comprising applying the
clustering to any of an analysis of sales of a store for a merchant, and an organization of a
layout of the store to facilitate easy access to products. 

20. (Previously Presented) The method of claim 17, further comprising applying the
clustering to personalize an electronic store layout to an individual customer by using a 
relationship that is specific to the individual customer. 

21. (Previously Presented) The method of claim 5, further comprising applying the clustering
to customer segmentation for a sales or service organization wherein the data points comprise
customers in a database, wherein the lead values are any of a total purchase amount per unit time
of said customers, income of said customers, a number of times customers visited an electronic 
store, and a number of items bought by the customer, wherein a relationship between customers 
is either explicitly defined or derived from some relevant data, with a resulting clustering 
reflecting a structured grouping of customers with similar performances. 

22. (Previously Presented) The method of claim 21, wherein a customer di is related to a
customer dj by a fraction of products bought by dj that are also bought by di. 

23. (Currently Amended) A system for clustering data points with defined quantified
[[relationships]] relation values between them, said system comprising:

means for obtaining a lead value for each data point, wherein said lead value for each
data point is derived [[from any of said quantified relationships and as given input]] by taking a sum
of all relation values input into said data point plus a frequency associated with said data point,

means for ranking each data point in a lead value sequence list in descending order of
lead value, 

means for assigning a first data point in said lead value sequence list as a leader of a first
cluster,

means for considering each subsequent data point in said lead value sequence list as a 
leader of a new cluster if its relationship with leaders of each of the previous clusters is less than 
a defined threshold value or as a member of at least one cluster where its relationship with a 
cluster leader is at least equal to said threshold value, wherein the threshold value is adaptively
found for a given number of clusters, and

means for generating a text summarization of any of a single document and a collection
of documents [[based on said clustering of data points]] by segmenting a given text input
comprising said data points into clusters, and forming a set of leaders of said clusters to represent
said text summarization.
24. (Previously Presented) The system of claim 23, wherein said quantified relationships 
between data points are any of symmetric and asymmetric quantified relationships. 

25. (Currently Amended) The system of claim 23, wherein [[the means for obtaining the lead
value of each data point is determined by taking a sum of relation values of each of other data
points to said data point]] said frequency equals one.

26. (Currently Amended) The system of claim 23, [[wherein said threshold value is adaptively
found for a given number of clusters]] further comprising means for identifying distinct data points
using said lead values and said relation values between said data points.

27. (Previously Presented) The system of claim 23, further comprising means for organizing 
a set of data points into a hierarchy of clusters using means for clustering the data points into sets 
of small sizes, wherein each smaller set is further subclustered; and repeatedly subclustering said 
smaller set until a terminating condition is reached.

28. (Currently Amended) The system of claim 23, wherein said means for generating further 
comprises: 

means for segmenting a given input text into blocks comprising sentences, a collection of
sentences, and paragraphs,

means for excluding words belonging to a defined list of defined stop words,
means for replacing words by their existing unique synonymous word from a given
collection of synonyms,

means for applying stemming algorithms for mapping words to root words,
means for representing resulting blocks of text, with respect to a dictionary which is
either given or computed from the input text, by a binary vector of size equal to the number of
words in the dictionary whose ith element is 1 if an ith word in the dictionary is present in the
block,

means for computing the relationship between any data points di and dj by evaluating
R(di,dj) = |di.Tdi|/|di|, wherein T is a thesaurus matrix whose ijth element reflects an extent of
inclusion of meaning of jth word in the meaning of ith word, and

means for clustering the data points [[wherein the lead value of each data point is
determined by taking a sum of relation values of each of other data points to said data point,
wherein the threshold value is adaptively found for a given number of clusters, and wherein a set
of leaders of resulting clusters summarize a given text]].

29. (Previously Presented) The system of claim 28, wherein said dictionary is computed by 
taking a fraction of words, excluding said stop words, with a highest tfidf value, which is given 
by: 

tfidf(wi) = tfi * log(N/dfi),

where tfidf(wi) is the lead value of data point wi, tfi = a number of times the data point wi
occurred in a whole text, dfi = a number of documents containing the data point wi and N = a
total number of documents in the text. 

30. (Previously Presented) The system of claim 28, wherein said thesaurus matrix comprises 
any of a given identity matrix, and a computed matrix from a collection of documents. 

31. (Previously Presented) The system of claim 28, wherein each block is represented by a
vector whose ith element represents a frequency of occurrence of said ith word in the block. 

32. (Previously Presented) The system of claim 28, further comprising means for organizing
a set of text documents into a hierarchy of clusters by using means for clustering given
documents into sets of small sizes, wherein each smaller set is further subclustered; and
repeatedly subclustering said smaller set until a terminating condition is reached.

33. (Previously Presented) The system of claim 32, further comprising means for organizing 
results returned by an information retrieval system in response to an user query into an hierarchy 
of clusters. 

34. (Previously Presented) The system of claim 33, wherein the hierarchy is used to aid the 
user in any of modifying a query of said user and browsing through said results. 

35. (Previously Presented) The system of claim 33, wherein said information retrieval system 
comprises a search engine retrieving Web documents. 

36. (Previously Presented) The system of claim 27, wherein said means for generating is used 
for vocabulary organization for a group of documents wherein the data points are words in a 
dictionary of the vocabulary, wherein the lead value of a word is any of its frequency of 
occurrence in the collection of documents, a number of documents containing the word, and a 
tfidf value of said word, wherein a relationship R(di,dj) denotes a fraction of documents
containing a jth word that also contains an ith word, and the clustering of said data points results
in a structured hierarchical organization of the vocabulary.

37. (Previously Presented) The system of claim 36, wherein a structured vocabulary is used
to provide text summarization for associated documents. 

38. (Previously Presented) The system of claim 36, further comprising means for using the 
clustering for customer profiling wherein a dictionary is built and the vocabulary is organized 
using documents that are viewed by a customer.

39. (Previously Presented) The system of claim 27, wherein said data points correspond to 
products cataloged in an electronic store, the lead value of a product is its per unit profit, its per 
unit value or a number of items sold per unit time, and a relationship between the products is 
either explicitly defined or derived from purchase data. 

40. (Previously Presented) The system of claim 39, wherein a product di is related to a
product dj by a fraction of customer transactions containing dj that also contain di. 

41. (Previously Presented) The system of claim 39, further comprising means for applying
the clustering to any of an analysis of sales of a store for a merchant, and an organization of a
layout of the store to facilitate easy access to products.

42. (Previously Presented) The system of claim 39, further comprising means for applying
the clustering to personalize an electronic store layout to an individual customer by using a
relationship that is specific to the individual customer.

43. (Previously Presented) The system of claim 27, further comprising means for applying 
the clustering for customer segmentation for a sales or service organization wherein the data 
points comprise customers in a database, wherein the lead values are any of a total purchase 
amount per unit time of said customers, income of said customers, a number of times customers 
visited an electronic store, and a number of items bought by the customer, wherein a relationship 
between customers is either explicitly defined or derived from some relevant data, with a 
resulting clustering reflecting a structured grouping of customers with similar performances.

44. (Previously Presented) The system of claim 43, wherein a customer di is related to a 
customer dj by a fraction of products bought by dj that are also bought by di. 

45. (Currently Amended) A computer program product comprising computer readable
program code stored on computer readable storage medium embodied therein for clustering data
points with defined quantified [[relationships]] relation values between them, comprising:

computer readable program code means for obtaining a lead value for each data point,
wherein said lead value for each data point is derived [[from any of said quantified relationships
and as given input]] by taking a sum of all relation values input into said data point plus a
frequency associated with said data point,

computer readable program code means for ranking each data point in a lead value
sequence list in descending order of lead value,

computer readable program code means for assigning a first data point in said lead value
sequence list as a leader of a first cluster,

computer readable program code means for considering each subsequent data point in 
said lead value sequence list as a leader of a new cluster if its relationship with leaders of each of 
the previous clusters is less than a defined threshold value or as a member of at least one cluster 
where its relationship with a cluster leader is at least equal to said threshold value, wherein the
threshold value is adaptively found for a given number of clusters, and

computer readable program code means for generating a text summarization of any of a
single document and a collection of documents [[based on said clustering of data points]] by
segmenting a given text input comprising said data points into clusters, and forming a set of
leaders of said clusters to represent said text summarization.

46. (Previously Presented) The computer program product of claim 45, wherein said 
quantified relationships between data points are any of symmetric and asymmetric quantified
relationships. 

47. (Currently Amended) The computer program product of claim 45, wherein said [[computer
readable program code means is configured for obtaining the lead value of each data point is
determined by taking a sum of relation values of each of other data points to said data point]]
frequency equals one.

48. (Currently Amended) The computer program product of claim 45, [[wherein said threshold
value is adaptively found for a given number of clusters]] further comprising computer readable
program code means for identifying distinct data points using said lead values and said relation
values between said data points.

49. (Previously Presented) The computer program product of claim 45, further comprising 
computer readable program code means configured for organizing a set of data points into a 
hierarchy of clusters using computer readable program code means configured for clustering the 
data points into sets of small sizes, wherein each smaller set is further subclustered; and 
repeatedly subclustering said smaller set until a terminating condition is reached.

50. (Currently Amended) The computer program product of claim 45, wherein said computer 
readable program code means configured for generating further comprises:

computer readable program code means configured for segmenting a given input text into 
blocks comprising sentences, a collection of sentences, and paragraphs, 

computer readable program code means configured for excluding words belonging to a 
defined list of defined stop words, 

computer readable program code means configured for replacing words by their existing 
unique synonymous word, if it exists, from a given collection of synonyms,

computer readable program code means configured for applying stemming algorithms for 
mapping words to root words, 

computer readable program code means configured for representing resulting blocks of
text, with respect to a dictionary which is either given or computed from the input text, by a 
binary vector of size equal to the number of words in the dictionary whose ith element is 1 if an 
ith word in the dictionary is present in the block, 

computer readable program code means configured for computing the relationship 
between any data points di and dj by evaluating R(di,dj) = |di.Tdi|/|di|, wherein T is a thesaurus
matrix whose ijth element reflects an extent of inclusion of meaning of jth word in the meaning 
of ith word, and 

computer readable program code means configured for clustering the data points [[wherein
the lead value of each data point is determined by taking a sum of relation values of each of other
data points to said data point, wherein the threshold value is adaptively found for a given number
of clusters, and wherein a set of leaders of resulting clusters summarize a given text]].

5 1 . (Previously Presented) The computer program product of claim 50, wherein said 
dictionary is computed by taking a fraction of words, excluding said stop words, with a highest 
tfidf value, which is given by: 
tfidf(wi) = tfi * log(N/dfi),

where tfidf(wi) is the lead value of data point wi, tfi = a number of times the data point wi
occurred in a whole text, dfi = a number of documents containing the data point wi and N = a
total number of documents in the text.

52. (Previously Presented) The computer program product of claim 50, wherein said
thesaurus matrix comprises any of a given identity matrix, and a computed matrix from a
collection of documents. 

53. (Previously Presented) The computer program product of claim 50, wherein each block is 
represented by a vector whose ith element represents a frequency of occurrence of said ith word 
in the block. 

54. (Previously Presented) The computer program product of claim 50, further comprising 
computer readable program code means configured for organizing a set of text documents into a 
hierarchy of clusters by using computer readable program code means configured for clustering 
given documents into sets of small sizes, wherein each smaller set is further subclustered; and 
repeatedly subclustering said smaller set until a terminating condition is reached. 

55. (Previously Presented) The computer program product of claim 54, further comprising 
computer readable program code means configured for organizing results returned by an 
information retrieval system in response to an user query into an hierarchy of clusters.

56. (Previously Presented) The computer program product of claim 55, wherein the hierarchy 
is used to aid the user in any of modifying a query of said user and browsing through said results.

57. (Previously Presented) The computer program product of claim 55, wherein said
information retrieval system comprises a search engine retrieving Web documents. 

58. (Previously Presented) The computer program product of claim 49, wherein said 
computer readable program code means configured for generating is used for vocabulary 
organization for a group of documents wherein the data points are words in a dictionary of the 
vocabulary, wherein the lead value of a word is any of its frequency of occurrence in the 
collection of documents, a number of documents containing the word, and a tfidf value of said 
word, wherein a relationship R(di,dj) denotes a fraction of documents containing a jth word that 
also contains an ith word, and the clustering of said data points results in a structured hierarchical 
organization of the vocabulary. 

59. (Previously Presented) The computer program of claim 58, wherein a structured 
vocabulary is used to provide text summarization for associated documents. 

60. (Previously Presented) The computer program product of claim 58, further comprising 
computer readable program code means configured for using the clustering for customer 
profiling wherein a dictionary is built and the vocabulary is organized using documents that are 
viewed by a customer.

61. (Previously Presented) The computer program product of claim 49, wherein said data
points correspond to products cataloged in an electronic store, the lead value of a product is its
per unit profit, its per unit value or a number of items sold per unit time, and a relationship
between the products is either explicitly defined or derived from purchase data.

62. (Previously Presented) The computer program product of claim 61, wherein a product di
is related to a product dj by a fraction of customer transactions containing dj that also contain di. 

63. (Previously Presented) The computer program product of claim 61, further comprising
computer readable program code means configured for applying the clustering to any of an 
analysis of sales of a store for a merchant, and an organization of a layout of the store to facilitate 
easy access to products. 

64. (Previously Presented) The computer program product of claim 61, further comprising
computer readable program code means configured for applying the clustering to personalize an 
electronic store layout to an individual customer by using a relationship that is specific to the 
individual customer. 

65. (Previously Presented) The computer program product of claim 49, further comprising 
computer readable program code means configured for applying the clustering for customer 
segmentation for a sales or service organization wherein the data points comprise customers in a 
database, wherein the lead values are any of a total purchase amount per unit time of said 
customers, income of said customers, a number of times customers visited an electronic store, 
and a number of items bought by the customer, wherein a relationship between customers is
either explicitly defined or derived from some relevant data, with a resulting clustering reflecting
a structured grouping of customers with similar performances. 

66. (Previously Presented) The computer program product of claim 65, wherein a customer 
di is related to a customer dj by a fraction of products bought by dj that are also bought by di.